This study investigates the distribution of Dormibacterota, a bacterial phylum, across National Ecological Observatory Network (NEON) sites, and the taxonomic breakdown at one specific site, Guánica State Forest. Dormibacterota are currently poorly understood, in part because of their low abundance compared to other bacterial phyla. Using R as the data analysis tool, this project presents taxonomic breakdowns that provide insight into the abundance of Dormibacterota across sites and into the diversity within the tropical forest ecosystem of Guánica State Forest. The analysis showed that Dormibacterota composition varies among sites, suggesting that this phylum prefers certain environmental conditions. Within Guánica State Forest, distinct taxonomic profiles were observed, with relatively low diversity at that site alone. Lastly, this project provides insight into the ecological roles of Dormibacterota in various ecosystems and into the colonization that is possible in tropical forest ecosystems.
The motivation for this project was to analyze the diversity of taxonomic groups at Guánica State Forest. Knowing which taxonomic groups are present helps researchers understand which organisms can grow in a given region, depending on its geography. Additionally, analyzing the sites where Dormibacterota were found helps researchers understand where this phylum prefers to live or thrives. Learning data analysis with outside data is a valuable skill for early-career researchers and will make them better scientists in the long run. Data analysis also allows researchers to recognize patterns that can hint at relationships between environments and the organisms living there. Research like this also helps identify species-rich environments so that they can be protected and used for further research.

## Introduction

Guánica State Forest is a subtropical dry forest in southwest Puerto Rico. It is the best-preserved dry forest in the Caribbean. It has a warm climate with two rainy/hurricane seasons. It is home to over 700 species of plants that are divided into three groups: deciduous forest, semi-evergreen forest, and scrub forest. Its most famous plant is a guaiac wood tree that could be as old as 1,000 years. The site contains multiple ecosystems, including beaches, coral reefs, salt flats, mangrove forests, and limestone caverns (Sotomayor-Mena and Rios-Velazquez, 2020). Half of Puerto Rico's birds occur in Guánica State Forest, and it is one of the few habitats where Cook's pallid anole (a lizard species) can be found. The forest supports both marine and terrestrial wildlife, from coral reef communities to birds, grasshoppers, and ants.

Dormibacterota are a phylum of uncultured, oligotrophic bacteria that live in soil and are normally found in cold deserts. They are known for survival mechanisms that allow them to persist under starvation conditions. They are thought to be aerobic heterotrophs and, based on genome analysis, they can synthesize, store, and break down glycogen (Montgomery et al., 2021). This phylum is not well researched because its members are most commonly found in extremely cold environments. Ongoing research is examining the phylogenetic relationships within Dormibacterota and their contribution to the environments in which they live.
## Data Acquisition and Preparation

Data Collection: Taxonomic data can be obtained from sources such as biodiversity databases, field surveys, or existing literature. I obtained my data from the National Ecological Observatory Network (NEON).

Data Cleaning and Formatting: I cleaned the NEON data to remove inconsistencies, missing values, and errors. I also formatted the data for analysis in RStudio by reducing the data sets to a workable size and keeping only the columns I wanted to analyze.

## Data Exploration and Visualization

Exploratory Data Analysis: I explored the taxonomic data to understand its structure, distribution, and characteristics, using histograms, bar graphs, and box plots to visualize these features.

## Software and Packages

RStudio: I performed all analyses in R using RStudio and pushed the project to GitHub for storage and collaboration.

R Packages: I used several R packages for taxonomic analysis, including tidyverse, ggtree, and data.table.

library(tidyverse)
library(knitr)
library(ggtree)
library(TDbook)
library(ggimage)
library(rphylopic)
library(treeio)
library(tidytree)
library(ape)
library(TreeTools)
library(phytools)
library(ggnewscale)
library(ggtreeExtra)
library(ggstar)
library(data.table)
NEON_MAGs <- read_csv("data/NEON/GOLD_Study_ID_Gs0161344_NEON.csv")
## Rows: 1754 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Bin ID, Genome Name, Bin Quality, Bin Lineage, GTDB-Tk Taxonomy L...
## dbl (10): IMG Genome ID, Bin Completeness, Bin Contamination, Total Number ...
## date (1): Date Added
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NEON_MAGs)
## # A tibble: 6 × 19
## `Bin ID` `Genome Name` `IMG Genome ID` `Bin Quality` `Bin Lineage`
## <chr> <chr> <dbl> <chr> <chr>
## 1 3300060643_14 Terrestrial soil mi… 3300060643 MQ <NA>
## 2 3300060643_16 Terrestrial soil mi… 3300060643 MQ Bacteria
## 3 3300060643_18 Terrestrial soil mi… 3300060643 MQ Bacteria; Ac…
## 4 3300060643_2 Terrestrial soil mi… 3300060643 MQ Bacteria; Ac…
## 5 3300060643_28 Terrestrial soil mi… 3300060643 MQ Bacteria; Ps…
## 6 3300060643_35 Terrestrial soil mi… 3300060643 MQ Bacteria; Ac…
## # ℹ 14 more variables: `GTDB-Tk Taxonomy Lineage` <chr>, `Bin Methods` <chr>,
## # `Created By` <chr>, `Date Added` <date>, `Bin Completeness` <dbl>,
## # `Bin Contamination` <dbl>, `Total Number of Bases` <dbl>, `5s rRNA` <dbl>,
## # `16s rRNA` <dbl>, `23s rRNA` <dbl>, `tRNA Genes` <dbl>, `Gene Count` <dbl>,
## # `Scaffold Count` <dbl>, `GOLD Study ID` <chr>
str(NEON_MAGs)
## spc_tbl_ [1,754 × 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Bin ID : chr [1:1754] "3300060643_14" "3300060643_16" "3300060643_18" "3300060643_2" ...
## $ Genome Name : chr [1:1754] "Terrestrial soil microbial communities from National Grasslands LBJ, Texas, USA - CLBJ_001-M-20210506-comp-1" "Terrestrial soil microbial communities from National Grasslands LBJ, Texas, USA - CLBJ_001-M-20210506-comp-1" "Terrestrial soil microbial communities from National Grasslands LBJ, Texas, USA - CLBJ_001-M-20210506-comp-1" "Terrestrial soil microbial communities from National Grasslands LBJ, Texas, USA - CLBJ_001-M-20210506-comp-1" ...
## $ IMG Genome ID : num [1:1754] 3.3e+09 3.3e+09 3.3e+09 3.3e+09 3.3e+09 ...
## $ Bin Quality : chr [1:1754] "MQ" "MQ" "MQ" "MQ" ...
## $ Bin Lineage : chr [1:1754] NA "Bacteria" "Bacteria; Actinomycetota; Actinomycetes" "Bacteria; Actinomycetota; Actinomycetes" ...
## $ GTDB-Tk Taxonomy Lineage: chr [1:1754] "Bacteria; Acidobacteriota; Blastocatellia; Pyrinomonadales; Pyrinomonadaceae; PSRF01" "Bacteria; Acidobacteriota; Vicinamibacteria; Vicinamibacterales; UBA2999; Gp6-AA45" "Bacteria; Actinobacteriota; Actinomycetia; Streptosporangiales; Streptosporangiaceae; Chersky-822" "Bacteria; Actinobacteriota; Actinomycetia; Mycobacteriales; Jatrophihabitantaceae; JAFAWL01" ...
## $ Bin Methods : chr [1:1754] "MetaBAT v2:2.15, CheckM v1.2.1, GTDB-tk v2.1.1, GTDB database release R207_v2" "MetaBAT v2:2.15, CheckM v1.2.1, GTDB-tk v2.1.1, GTDB database release R207_v2" "MetaBAT v2:2.15, CheckM v1.2.1, GTDB-tk v2.1.1, GTDB database release R207_v2" "MetaBAT v2:2.15, CheckM v1.2.1, GTDB-tk v2.1.1, GTDB database release R207_v2" ...
## $ Created By : chr [1:1754] "IMG_PIPELINE" "IMG_PIPELINE" "IMG_PIPELINE" "IMG_PIPELINE" ...
## $ Date Added : Date[1:1754], format: "2023-04-06" "2023-04-06" ...
## $ Bin Completeness : num [1:1754] 96.2 77.5 77.2 58.4 68.7 ...
## $ Bin Contamination : num [1:1754] 2.56 5.3 1.99 3.74 4.67 0 2.97 3.16 1.71 5.17 ...
## $ Total Number of Bases : num [1:1754] 6247032 5394623 4389455 3228217 3245901 ...
## $ 5s rRNA : num [1:1754] 0 0 0 0 0 1 3 0 1 0 ...
## $ 16s rRNA : num [1:1754] 1 0 0 0 0 0 1 1 0 0 ...
## $ 23s rRNA : num [1:1754] 0 0 0 0 0 1 1 0 1 0 ...
## $ tRNA Genes : num [1:1754] 54 32 35 29 12 26 24 37 47 34 ...
## $ Gene Count : num [1:1754] 5373 5406 4705 3762 3446 ...
## $ Scaffold Count : num [1:1754] 39 878 607 592 474 386 270 547 10 186 ...
## $ GOLD Study ID : chr [1:1754] "Gs0161344" "Gs0161344" "Gs0161344" "Gs0161344" ...
## - attr(*, "spec")=
## .. cols(
## .. `Bin ID` = col_character(),
## .. `Genome Name` = col_character(),
## .. `IMG Genome ID` = col_double(),
## .. `Bin Quality` = col_character(),
## .. `Bin Lineage` = col_character(),
## .. `GTDB-Tk Taxonomy Lineage` = col_character(),
## .. `Bin Methods` = col_character(),
## .. `Created By` = col_character(),
## .. `Date Added` = col_date(format = ""),
## .. `Bin Completeness` = col_double(),
## .. `Bin Contamination` = col_double(),
## .. `Total Number of Bases` = col_double(),
## .. `5s rRNA` = col_double(),
## .. `16s rRNA` = col_double(),
## .. `23s rRNA` = col_double(),
## .. `tRNA Genes` = col_double(),
## .. `Gene Count` = col_double(),
## .. `Scaffold Count` = col_double(),
## .. `GOLD Study ID` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
NEON_MAGs_Ind <- NEON_MAGs %>%
filter(`Genome Name` != "NEON combined assembly")
NEON_MAGs_Ind_tax <- NEON_MAGs_Ind %>%
separate(`GTDB-Tk Taxonomy Lineage`, c("Domain", "Phylum", "Class", "Order", "Family", "Genus"), "; ", remove = FALSE)
## Warning: Expected 6 pieces. Additional pieces discarded in 21 rows [12, 32, 66, 79, 80,
## 88, 96, 102, 104, 240, 334, 386, 657, 790, 846, 931, 943, 983, 1041, 1095,
## ...].
## Warning: Expected 6 pieces. Missing pieces filled with `NA` in 282 rows [6, 7, 42, 49,
## 50, 55, 60, 83, 85, 97, 100, 105, 107, 113, 114, 116, 119, 125, 129, 130, ...].
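
The two warnings above come from lineages that have more or fewer than six ranks. As a minimal sketch on a made-up three-row table (the `toy` data and the three-rank split are hypothetical, chosen only for illustration), the `extra` and `fill` arguments of `tidyr::separate()` make that handling explicit instead of relying on the warning defaults:

```r
library(tidyr)
library(tibble)

# Hypothetical toy lineages, not taken from the NEON file
toy <- tibble(
  lineage = c(
    "Bacteria; Actinobacteriota; Thermoleophilia",                     # exactly 3 ranks
    "Bacteria; Dormibacterota",                                        # too few ranks
    "Bacteria; Acidobacteriota; Vicinamibacteria; Vicinamibacterales"  # too many ranks
  )
)

toy_tax <- separate(
  toy,
  lineage,
  into = c("Domain", "Phylum", "Class"),
  sep = "; ",
  remove = FALSE,
  extra = "drop",  # discard ranks beyond the last requested column
  fill = "right"   # pad missing trailing ranks with NA
)

toy_tax
```

With these settings the result is the same as with the defaults, but the extra pieces are dropped and the missing ones are filled with `NA` without a warning, which documents that the behavior is intentional.
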
All Phyla Counts
kable(
NEON_MAGs_Ind_tax %>%
count(Phylum, sort = TRUE)
)
| Phylum | n |
|---|---|
| Actinobacteriota | 418 |
| Proteobacteria | 248 |
| Acidobacteriota | 181 |
| Verrucomicrobiota | 57 |
| NA | 38 |
| Chloroflexota | 35 |
| Myxococcota | 29 |
| Bacteroidota | 22 |
| Gemmatimonadota | 16 |
| Methylomirabilota | 16 |
| Planctomycetota | 16 |
| Dormibacterota | 11 |
| Eremiobacterota | 11 |
| Desulfobacterota_B | 9 |
| Desulfobacterota | 5 |
| Patescibacteria | 5 |
| Tectomicrobia | 3 |
| Cyanobacteria | 2 |
| Myxococcota_A | 2 |
| Armatimonadota | 1 |
| Chlamydiota | 1 |
| Eisenbacteria | 1 |
| Firmicutes | 1 |
| Krumholzibacteriota | 1 |
| Nitrospirota | 1 |
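
To put the Dormibacterota count in context, the counts can be converted to percentages. The sketch below retypes four rows of the table above and collapses every other row (including `NA`) into a single "Other" entry, which is my own shorthand and not a category in the data:

```r
library(dplyr)
library(tibble)

# Counts copied from the table above; "Other" is the sum of all remaining rows
phylum_counts <- tibble(
  Phylum = c("Actinobacteriota", "Proteobacteria", "Acidobacteriota",
             "Dormibacterota", "Other"),
  n = c(418, 248, 181, 11, 272)
)

phylum_share <- phylum_counts %>%
  mutate(percent = 100 * n / sum(n)) %>%  # share of all individual-assembly MAGs
  arrange(desc(n))

phylum_share %>%
  filter(Phylum == "Dormibacterota")
```

The counts sum to 1,130 MAGs, so the 11 Dormibacterota bins make up just under 1% of the individual assemblies.
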
NEON_MAGs <- read_csv("data/NEON/GOLD_Study_ID_Gs0161344_NEON.csv") %>%
  # remove columns that are not needed for data analysis
  select(-c(`GOLD Study ID`, `Bin Methods`, `Created By`, `Date Added`)) %>%
  # create a new column with the Assembly Type ("Combined" or "Individual")
  mutate("Assembly Type" = case_when(`Genome Name` == "NEON combined assembly" ~ "Combined",
                                     TRUE ~ "Individual")) %>%
  # split the GTDB-Tk lineage into one column per taxonomic rank
  separate(`GTDB-Tk Taxonomy Lineage`, c("Domain", "Phylum", "Class", "Order", "Family", "Genus"), "; ", remove = FALSE) %>%
  # remove the common prefix "Terrestrial soil microbial communities from "
  mutate_at("Genome Name", str_replace, "Terrestrial soil microbial communities from ", "") %>%
  # split at " - " into the site description and the sample name
  separate(`Genome Name`, c("Site", "Sample Name"), " - ") %>%
  # remove the common suffix "-comp-1"
  mutate_at("Sample Name", str_replace, "-comp-1", "") %>%
  # separate the Sample Name into Site ID and plot info
  separate(`Sample Name`, c("Site ID", "subplot.layer.date"), "_", remove = FALSE) %>%
  # separate the plot info into 3 columns
  separate(`subplot.layer.date`, c("Subplot", "Layer", "Date"), "-")
## Rows: 1754 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Bin ID, Genome Name, Bin Quality, Bin Lineage, GTDB-Tk Taxonomy L...
## dbl (10): IMG Genome ID, Bin Completeness, Bin Contamination, Total Number ...
## date (1): Date Added
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Warning: Expected 6 pieces. Additional pieces discarded in 29 rows [12, 32, 66, 79, 80,
## 88, 96, 102, 104, 240, 334, 386, 657, 790, 846, 931, 943, 983, 1041, 1095,
## ...].
## Warning: Expected 6 pieces. Missing pieces filled with `NA` in 429 rows [6, 7, 42, 49,
## 50, 55, 60, 83, 85, 97, 100, 105, 107, 113, 114, 116, 119, 125, 129, 130, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 624 rows [1131, 1132,
## 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145,
## 1146, 1147, 1148, 1149, 1150, ...].
NEON_MAGs_bact_ind <- NEON_MAGs %>%
filter(Domain == "Bacteria") %>%
filter(`Assembly Type` == "Individual")
Phyla with Dormibacterota Filtered
kable(
  NEON_MAGs_Ind_tax %>%
    # keep only the Dormibacterota rows before counting
    filter(Phylum == "Dormibacterota") %>%
    count(Phylum)
)
| Phylum | n |
|---|---|
| Dormibacterota | 11 |
NEON_MAGs_bact_ind %>%
ggplot(aes(x = Phylum)) +
geom_bar() +
coord_flip() +
labs(title = "Phyla Counts Across All Sites")
NEON_MAGs_bact_ind %>%
ggplot(aes(x = fct_rev(fct_infreq(Phylum)), fill = Site)) +
geom_bar() +
coord_flip() +
labs(title = "Phyla Counts Labeled by Site")
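
The `fct_rev(fct_infreq(Phylum))` call in the plot above controls the bar order. A minimal sketch on a made-up factor (the six values below are hypothetical) shows what each step does to the factor levels:

```r
library(forcats)

# Hypothetical phylum labels: 3 x Actinobacteriota, 2 x Proteobacteria, 1 x Dormibacterota
phyla <- factor(c("Dormibacterota", "Proteobacteria", "Proteobacteria",
                  "Actinobacteriota", "Actinobacteriota", "Actinobacteriota"))

# fct_infreq() orders levels from most to least frequent
by_freq <- fct_infreq(phyla)
levels(by_freq)

# fct_rev() reverses that order, so the rarest level comes first
flipped <- fct_rev(by_freq)
levels(flipped)
```

Because `coord_flip()` draws the first level at the bottom of the axis, reversing the frequency order places the most abundant phylum at the top of the chart.
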
NEON_MAGs_bact_ind %>%
ggplot(aes(x = Phylum)) +
geom_bar(position = position_dodge2(width = 0.9, preserve = "single")) +
coord_flip() +
facet_wrap(vars(Site), scales = "free", ncol = 2) +
labs(title = "Phyla Counts Separated Out by Site")
NEON_MAGs_bact_ind %>%
ggplot(aes(x = fct_infreq(Phylum), y = `Total Number of Bases`)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=45, vjust=1, hjust=1)) +
labs(title = "Total Number of Nucleotide Bases for each Major Phylum")
NEON_MAGs_bact_ind %>%
ggplot(aes(x = Subplot, color = `Site ID`, fill = `Site ID`)) +
geom_bar() +
coord_flip() +
labs (title = "Subplot Count Colored by Site ID")
NEON_MAGs_bact_ind %>%
ggplot(aes(x = Site, fill = Phylum)) +
geom_bar() +
coord_flip() +
labs(title = "Phyla Counts at Various Sites, Colored by Phylum")
NEON_MAGs_bact_ind %>%
ggplot(aes(x = `Total Number of Bases`, y = `Gene Count`, color = Phylum)) +
geom_point() +
coord_flip() +
labs(title = "Gene Count vs Total Number of Bases At All Sites, Colored by Phylum")
NEON_MAGs_GSF <- NEON_MAGs %>%
filter(str_detect(`Site`, "Guanica State Forest and Biosphere Reserve, Puerto Rico"))
NEON_MAGs_D <- NEON_MAGs %>%
filter(str_detect(`GTDB-Tk Taxonomy Lineage`, "Dormibacterota"))
NEON_metagenomes <- read_tsv("data/NEON/exported_img_data_Gs0161344_NEON.tsv") %>%
  rename(`Genome Name` = `Genome Name / Sample Name`) %>%
  # drop re-annotated samples and the WREF plot entry
  filter(str_detect(`Genome Name`, "re-annotation", negate = TRUE)) %>%
  filter(str_detect(`Genome Name`, "WREF plot", negate = TRUE))
## Rows: 176 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (18): Domain, Sequencing Status, Study Name, Genome Name / Sample Name, ...
## dbl (16): taxon_oid, IMG Genome ID, Depth In Meters, Elevation In Meters, Ge...
## lgl (12): Altitude In Meters, Chlorophyll Concentration, Longhurst Code, Lon...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
NEON_metagenomes <- NEON_metagenomes %>%
  # remove the common prefix "Terrestrial soil microbial communities from "
  mutate_at("Genome Name", str_replace, "Terrestrial soil microbial communities from ", "") %>%
  # split at " - " into the site description and the sample name
  separate(`Genome Name`, c("Site", "Sample Name"), " - ") %>%
  # remove the common suffix "-comp-1"
  mutate_at("Sample Name", str_replace, "-comp-1", "") %>%
  # separate the Sample Name into Site ID and plot info
  separate(`Sample Name`, c("Site ID", "subplot.layer.date"), "_", remove = FALSE) %>%
  # separate the plot info into 3 columns
  separate(`subplot.layer.date`, c("Subplot", "Layer", "Date"), "-")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [53].
NEON_chemistry <- read_tsv("data/NEON/neon_plot_soilChem1_metadata.tsv") %>%
# remove -COMP from genomicsSampleID
mutate_at("genomicsSampleID", str_replace, "-COMP", "")
## Rows: 87 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (5): genomicsSampleID, siteID, plotID, nlcdClass, horizon
## dbl (11): decimalLatitude, decimalLongitude, elevation, soilTemp, d15N, org...
## date (1): collectionDate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
NEON_FULL <- NEON_MAGs %>%
left_join(NEON_metagenomes, by = c("Sample Name")) %>%
left_join(NEON_chemistry, by = c("Sample Name" = "genomicsSampleID"))
NEON_FULL_D <- NEON_FULL %>%
filter(str_detect(`Phylum`,"Dormibacterota" ))
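
The joins above match only on `Sample Name`, but both tables also carry columns such as `Site`, which is why the joined table contains `Site.x` and `Site.y`. The sketch below reproduces this on two made-up tables (all names and values are hypothetical) and shows the `suffix` argument of `left_join()` as one way to get more readable names:

```r
library(dplyr)
library(tibble)

# Hypothetical miniature stand-ins for the MAG and metagenome tables
mags <- tibble(
  `Sample Name` = c("A_001", "B_002"),
  Site = c("Site A", "Site B"),
  Phylum = c("Phylum 1", "Phylum 2")
)
metagenomes <- tibble(
  `Sample Name` = c("A_001", "B_002"),
  Site = c("Site A", "Site B"),
  `Ecosystem Subtype` = c("Subtype 1", "Subtype 2")
)

# Default behavior: the shared non-key column is duplicated as Site.x and Site.y
joined <- left_join(mags, metagenomes, by = "Sample Name")
names(joined)

# Custom suffixes make the origin of each column explicit
joined_named <- left_join(mags, metagenomes, by = "Sample Name",
                          suffix = c("_mag", "_meta"))
names(joined_named)
```

Dropping the duplicated columns from one table before joining, or adding them to `by`, would be alternative designs; which one fits depends on whether the two copies are expected to agree.
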
NEON_FULL_D %>%
ggplot(aes(x = `Site.x`, y = `soilInWaterpH`)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=50, vjust=1, hjust=1)) +
labs(title = "Soil Water pH Across Sites, Specific to Dormibacterota")
## Warning: Removed 11 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
NEON_FULL_D %>%
ggplot(aes(x = `Bin Contamination`)) +
geom_bar() +
labs(title = "Dormibacterota Bin Contamination Counts")
ggplot(data = NEON_FULL_D, aes(x = `Ecosystem Subtype`, y = `soilTemp`)) +
  geom_point(aes(color = Order)) +
  labs(title = "Ecosystem Subtype vs Temperature Colored by Order")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_point()`).
NEON_MAGs_metagenomes_chemistry <- NEON_MAGs %>%
left_join(NEON_metagenomes, by = "Sample Name") %>%
left_join(NEON_chemistry, by = c("Sample Name" = "genomicsSampleID")) %>%
rename("label" = "Bin ID")
tree_bac <- read.tree("data/NEON/gtdbtk.bac120.decorated.tree")
tree_bac_preorder <- Preorder(tree_bac)
tree_Dormibacterota <- Subtree(tree_bac_preorder, 1767)
ggtree(tree_Dormibacterota) %<+%
NEON_MAGs_metagenomes_chemistry +
geom_tiplab(size=2, hjust=-.1) +
xlim(0,20) +
geom_point(mapping=aes(color=`Ecosystem Subtype`)) +
labs(title = "Dormibacterota Ecosystem Subtype Displayed Using Phylogenetic Tree")
tree_arc <- read.tree("data/NEON/gtdbtk.ar53.decorated.tree")
tree_bac <- read.tree("data/NEON/gtdbtk.bac120.decorated.tree")
node_vector_bac = c(tree_bac$tip.label,tree_bac$node.label)
grep("Dormibacterota", node_vector_bac, value = TRUE)
## [1] "'1.0:p__Dormibacterota; c__Dormibacteria'"
match(grep("Dormibacterota", node_vector_bac, value = TRUE), node_vector_bac)
## [1] 1767
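
The node number 1767 found above is specific to this tree file, so it can be safer to look it up in code than to type it in. The sketch below uses a four-tip toy tree with made-up labels and swaps in `ape::extract.clade()` for `TreeTools::Subtree()`; it illustrates the lookup only and is not the pipeline used in this report:

```r
library(ape)

# Hypothetical toy tree with labeled internal nodes
toy_tree <- read.tree(text = "((A,B)cladeAB,(C,D)cladeCD)root;")

# Tips are numbered 1..Ntip, followed by the internal nodes,
# so the position in this combined vector is the node number
all_labels <- c(toy_tree$tip.label, toy_tree$node.label)
node_id <- grep("cladeCD", all_labels, fixed = TRUE)

# Guard against zero or multiple matches before using the node number
stopifnot(length(node_id) == 1)

clade <- extract.clade(toy_tree, node = node_id)
clade$tip.label
```

Storing the result in a variable such as `node_id` means the subtree and highlight calls keep working if the tree file is regenerated and the node numbering changes.
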
NEON_MAGs_Dormibacterota <- NEON_MAGs_metagenomes_chemistry %>%
filter(Phylum == "Dormibacterota")
ggtree(tree_bac, layout="circular", branch.length="none") +
geom_hilight(node=1767, fill="steelblue", alpha=.6) +
geom_cladelab(node=1767, label="Dormibacterota", align=TRUE, offset = 0, textcolor='steelblue', barcolor='steelblue') +
geom_hilight(node=1789, fill="darkgreen", alpha=.6) +
geom_cladelab(node=1789, label="Actinomycetota", align=TRUE, vjust=-0.4, offset = 0, textcolor='darkgreen', barcolor='darkgreen') +
geom_hilight(node=2673, fill="darkorange", alpha=.6) +
geom_cladelab(node=2673, label="Acidobacteriota", align=TRUE, hjust=1.1, offset = 0, textcolor='darkorange', barcolor='darkorange') +
labs(title = "Circular Phylogenetic Tree Showing Dormibacterota in Relation to Actinomycetota and Acidobacteriota")
NEON_MAGs_metagenomes_chemistry_noblank <- NEON_MAGs_metagenomes_chemistry %>%
rename("AssemblyType" = "Assembly Type") %>%
rename("BinCompleteness" = "Bin Completeness") %>%
rename("BinContamination" = "Bin Contamination") %>%
rename("TotalNumberofBases" = "Total Number of Bases") %>%
rename("EcosystemSubtype" = "Ecosystem Subtype")
ggtree(tree_Dormibacterota) %<+%
NEON_MAGs_metagenomes_chemistry +
geom_tippoint(aes(colour=`Ecosystem Subtype`)) +
# For unknown reasons the following does not like blank spaces in the names
geom_facet(panel = "Bin Completeness", data = NEON_MAGs_metagenomes_chemistry_noblank, geom = geom_point,
mapping=aes(x = BinCompleteness)) +
geom_facet(panel = "Bin Contamination", data = NEON_MAGs_metagenomes_chemistry_noblank, geom = geom_col,
aes(x = BinContamination), orientation = 'y', width = .6) +
theme_tree2(legend.position=c(.1, .7)) +
labs(title = "Phylogenetic Tree Displaying Ecosystem Subtypes, Bin Completeness Counts, and Bin Contamination Counts")
ggtree(tree_Dormibacterota, layout="circular") %<+%
NEON_MAGs_metagenomes_chemistry +
geom_point2(mapping=aes(color=`Ecosystem Subtype`, size=`Total Number of Bases`)) +
labs(title = "Circular Dormibacterota Phylogenetic Tree Displaying Total Number of Bases and Ecosystem Subtype")
## Warning: Removed 21 rows containing missing values or values outside the scale range
## (`geom_point_g_gtree()`).
NEON_MAGs_Dormibacterota %>%
ggplot(aes(x=`Ecosystem Subtype`))+
geom_bar()+
coord_flip() +
labs(title = "Ecosystem Subtypes where Dormibacterota are Found")
# note: this keeps the samples from every site, not only GUAN
NEON_metagenomes_GUAN <- NEON_metagenomes %>%
  select(`Sample Name`, `Site ID`, `Ecosystem Subtype`)
kable(NEON_metagenomes_GUAN)
| Sample Name | Site ID | Ecosystem Subtype |
|---|---|---|
| CLBJ_006-M-20210506 | CLBJ | Grasslands |
| CLBJ_002-M-20210506 | CLBJ | Grasslands |
| WOOD_004-M-20210714 | WOOD | Wetlands |
| TOOL_002-O-20210804 | TOOL | Tundra |
| WREF_004-M-20210622 | WREF | Temperate forest |
| TEAK_004-O-20210726 | TEAK | Temperate forest |
| HEAL_048-M-20210622 | HEAL | Boreal forest/Taiga |
| KONZ_043-M-20210721 | KONZ | Grasslands |
| YELL_048-M-20210707 | YELL | Temperate forest |
| TOOL_041-O-20210803 | TOOL | Tundra |
| TOOL_003-O-20210805 | TOOL | Tundra |
| GUAN_048-M-20210920 | GUAN | Tropical forest |
| WREF_004-O-20210622 | WREF | Temperate forest |
| SRER_043-M-20210809 | SRER | Desert |
| SRER_006-M-20210809 | SRER | Desert |
| NIWO_004-M-20210726 | NIWO | Temperate forest |
| TEAK_004-M-20210726 | TEAK | Temperate forest |
| NIWO_002-M-20210728 | NIWO | Temperate forest |
| CLBJ_040-M-20210503 | CLBJ | Grasslands |
| ONAQ_004-M-20210525 | ONAQ | Shrubland |
| SRER_004-M-20210809 | SRER | Desert |
| WREF_073-M-20210623 | WREF | Temperate forest |
| YELL_005-M-20210708 | YELL | Temperate forest |
| WREF_001-O-20210621 | WREF | Temperate forest |
| WOOD_005-M-20210708 | WOOD | Wetlands |
| GUAN_042-M-20210920 | GUAN | Tropical forest |
| ONAQ_010-M-20210526 | ONAQ | Shrubland |
| TOOL_005-O-20210806 | TOOL | Tundra |
| TOOL_042-O-20210803 | TOOL | Tundra |
| TOOL_006-O-20210804 | TOOL | Tundra |
| HEAL_048-O-20210622 | HEAL | Boreal forest/Taiga |
| YELL_002-M-20210706 | YELL | Temperate forest |
| TEAK_025-M-20210726 | TEAK | Temperate forest |
| KONZ_024-M-20210719 | KONZ | Grasslands |
| WOOD_043-M-20210712 | WOOD | Wetlands |
| CLBJ_032-M-20210504 | CLBJ | Grasslands |
| YELL_012-O-20210708 | YELL | Temperate forest |
| BONA_009-O-20210707 | BONA | Boreal forest/Taiga |
| TEAK_043-M-20210719 | TEAK | Temperate forest |
| SRER_053-M-20210810 | SRER | Desert |
| CLBJ_001-M-20210506 | CLBJ | Grasslands |
| ONAQ_008-M-20210524 | ONAQ | Shrubland |
| GUAN_006-M-20210922 | GUAN | Tropical forest |
| CLBJ_033-M-20210505 | CLBJ | Grasslands |
| TEAK_002-O-20210720 | TEAK | Temperate forest |
| BONA_004-O-20210707 | BONA | Boreal forest/Taiga |
| WOOD_042-M-20210712 | WOOD | Wetlands |
| YELL_016-M-20210708 | YELL | Temperate forest |
| KONZ_045-M-20210721 | KONZ | Grasslands |
| ONAQ_002-M-20210524 | ONAQ | Shrubland |
| GUAN_003-M-20210922 | GUAN | Tropical forest |
| SRER_047-M-20210809 | SRER | Desert |
| NA | NA | Shrubland |
| TEAK_005-M-20210728 | TEAK | Temperate forest |
| TEAK_005-O-20210728 | TEAK | Temperate forest |
| NIWO_001-O-20210728 | NIWO | Temperate forest |
| CLBJ_038-M-20210504 | CLBJ | Grasslands |
| GUAN_004-M-20210922 | GUAN | Tropical forest |
| ONAQ_005-M-20210527 | ONAQ | Shrubland |
| YELL_009-M-20210706 | YELL | Temperate forest |
| NIWO_004-O-20210726 | NIWO | Temperate forest |
| WREF_073-O-20210623 | WREF | Temperate forest |
| ONAQ_003-M-20210527 | ONAQ | Shrubland |
| WREF_003-O-20210622 | WREF | Temperate forest |
| TOOL_044-O-20210803 | TOOL | Tundra |
| BONA_001-O-20210708 | BONA | Boreal forest/Taiga |
| YELL_003-M-20210708 | YELL | Temperate forest |
| CLBJ_003-M-20210506 | CLBJ | Grasslands |
| WOOD_001-M-20210714 | WOOD | Wetlands |
| WOOD_002-M-20210708 | WOOD | Wetlands |
| TEAK_003-M-20210726 | TEAK | Temperate forest |
| SRER_005-M-20210810 | SRER | Desert |
| SRER_052-M-20210810 | SRER | Desert |
| KONZ_046-M-20210720 | KONZ | Grasslands |
| NIWO_003-M-20210727 | NIWO | Temperate forest |
| YELL_046-M-20210705 | YELL | Temperate forest |
| BONA_006-O-20210707 | BONA | Boreal forest/Taiga |
| GUAN_043-M-20210921 | GUAN | Tropical forest |
| WOOD_024-O-20210714 | WOOD | Wetlands |
| NIWO_005-M-20210726 | NIWO | Temperate forest |
| TOOL_004-O-20210805 | TOOL | Tundra |
| WREF_003-M-20210622 | WREF | Temperate forest |
| TOOL_043-O-20210803 | TOOL | Tundra |
| WOOD_024-M-20210714 | WOOD | Wetlands |
| YELL_051-M-20210705 | YELL | Temperate forest |
| KONZ_042-M-20210720 | KONZ | Grasslands |
| GUAN_007-M-20210922 | GUAN | Tropical forest |
| WOOD_003-M-20210708 | WOOD | Wetlands |

ggplot(data = NEON_metagenomes_GUAN, aes(x = `Site ID`, y = `Ecosystem Subtype`)) +
  geom_point() +
  labs(title = "Ecosystem Subtype at each Site ID, Guanica State Forest = GUAN")

ggplot(NEON_MAGs_GSF) +
  geom_bar(mapping = aes(y = `GTDB-Tk Taxonomy Lineage`)) +
  labs(title = "Count of each Taxonomy Lineage at Guanica State Forest")

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Bin Lineage`)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Bin Lineage Counts at Guanica State Forest")

kable(
  NEON_MAGs %>%
    filter(str_detect(`Site`, "Guanica State Forest and Biosphere Reserve, Puerto Rico"))
)

| Bin ID | Site | Sample Name | Site ID | Subplot | Layer | Date | IMG Genome ID | Bin Quality | Bin Lineage | GTDB-Tk Taxonomy Lineage | Domain | Phylum | Class | Order | Family | Genus | Bin Completeness | Bin Contamination | Total Number of Bases | 5s rRNA | 16s rRNA | 23s rRNA | tRNA Genes | Gene Count | Scaffold Count | Assembly Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3300060854_24 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae | NA | NA | NA | NA | NA | NA | NA | 56.15 | 0.97 | 1533081 | 1 | 0 | 0 | 25 | 1741 | 146 | Individual |
| 3300060854_34 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae | NA | NA | NA | NA | NA | NA | NA | 53.25 | 1.46 | 1450069 | 0 | 0 | 0 | 30 | 1647 | 219 | Individual |
| 3300060854_37 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae; Nitrososphaera; Candidatus Nitrososphaera evergladensis | NA | NA | NA | NA | NA | NA | NA | 53.24 | 0.00 | 789887 | 0 | 1 | 2 | 13 | 1037 | 51 | Individual |
| 3300060854_44 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; Thermoleophilia; Gaiellales; Gaiellaceae; JACDAN01 | Bacteria | Actinobacteriota | Thermoleophilia | Gaiellales | Gaiellaceae | JACDAN01 | 56.82 | 9.48 | 2200962 | 0 | 0 | 0 | 35 | 2683 | 396 | Individual |
| 3300060854_5 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555 | Bacteria | Actinobacteriota | Acidimicrobiia | IMCC26256 | PALSA-555 | NA | 52.76 | 4.27 | 1397348 | 0 | 0 | 0 | 21 | 1718 | 270 | Individual |
| 3300060854_6 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_007-M-20210922 | GUAN | 007 | M | 20210922 | 3300060854 | MQ | Bacteria; Actinomycetota; Thermoleophilia | Bacteria; Actinobacteriota; Thermoleophilia; Solirubrobacterales; 70-9; VGBV01 | Bacteria | Actinobacteriota | Thermoleophilia | Solirubrobacterales | 70-9 | VGBV01 | 71.49 | 8.33 | 2400015 | 0 | 1 | 0 | 35 | 2858 | 362 | Individual |
| 3300060887_16 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria; Pseudomonadota; Alphaproteobacteria; Hyphomicrobiales | Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Hyphomicrobiaceae; AWTP1-13 | Bacteria | Proteobacteria | Alphaproteobacteria | Rhizobiales | Hyphomicrobiaceae | AWTP1-13 | 54.50 | 1.88 | 3392335 | 0 | 0 | 0 | 18 | 3738 | 578 | Individual |
| 3300060887_21 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555 | Bacteria | Actinobacteriota | Acidimicrobiia | IMCC26256 | PALSA-555 | NA | 67.66 | 4.16 | 2450677 | 0 | 0 | 0 | 39 | 2930 | 396 | Individual |
| 3300060887_26 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria; Actinomycetota; Thermoleophilia | Bacteria; Actinobacteriota; Thermoleophilia; Solirubrobacterales; 70-9 | Bacteria | Actinobacteriota | Thermoleophilia | Solirubrobacterales | 70-9 | NA | 79.35 | 9.48 | 2379854 | 1 | 0 | 1 | 31 | 2692 | 259 | Individual |
| 3300060887_27 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria | Bacteria; Desulfobacterota_B; Binatia; UBA9968; UBA9968; DP-20 | Bacteria | Desulfobacterota_B | Binatia | UBA9968 | UBA9968 | DP-20 | 65.01 | 0.65 | 3097026 | 0 | 0 | 0 | 18 | 3210 | 404 | Individual |
| 3300060887_32 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae | NA | NA | NA | NA | NA | NA | NA | 63.85 | 0.00 | 1469352 | 1 | 0 | 1 | 17 | 1670 | 242 | Individual |
| 3300060887_39 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Archaea | NA | NA | NA | NA | NA | NA | NA | 89.81 | 2.43 | 4009591 | 1 | 0 | 0 | 41 | 4918 | 399 | Individual |
| 3300060887_40 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria | Bacteria; Acidobacteriota; Blastocatellia; RBC074; RBC074; JADJLO01 | Bacteria | Acidobacteriota | Blastocatellia | RBC074 | RBC074 | JADJLO01 | 59.60 | 7.86 | 2998531 | 0 | 0 | 0 | 22 | 2763 | 312 | Individual |
| 3300060887_5 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_042-M-20210920 | GUAN | 042 | M | 20210920 | 3300060887 | MQ | Bacteria; Actinomycetota; Thermoleophilia | Bacteria; Actinobacteriota; Thermoleophilia; Solirubrobacterales; Thermoleophilaceae; JACVRW01 | Bacteria | Actinobacteriota | Thermoleophilia | Solirubrobacterales | Thermoleophilaceae | JACVRW01 | 57.81 | 0.00 | 2174027 | 0 | 0 | 0 | 13 | 2413 | 343 | Individual |
| 3300060888_13 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_003-M-20210922 | GUAN | 003 | M | 20210922 | 3300060888 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae; Nitrososphaera; Candidatus Nitrososphaera evergladensis | NA | NA | NA | NA | NA | NA | NA | 80.10 | 2.91 | 1834459 | 1 | 1 | 1 | 37 | 2365 | 31 | Individual |
| 3300060888_15 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_003-M-20210922 | GUAN | 003 | M | 20210922 | 3300060888 | MQ | Bacteria | Bacteria; Chloroflexota; Limnocylindria; QHBO01; QHBO01 | Bacteria | Chloroflexota | Limnocylindria | QHBO01 | QHBO01 | NA | 56.14 | 4.59 | 1799820 | 0 | 0 | 0 | 31 | 2108 | 328 | Individual |
| 3300060888_26 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_003-M-20210922 | GUAN | 003 | M | 20210922 | 3300060888 | MQ | Bacteria; Actinomycetota; Actinomycetes; Mycobacteriales; Mycobacteriaceae | Bacteria; Actinobacteriota; Actinomycetia; Mycobacteriales; Mycobacteriaceae; Mycobacterium | Bacteria | Actinobacteriota | Actinomycetia | Mycobacteriales | Mycobacteriaceae | Mycobacterium | 62.27 | 2.61 | 4614098 | 0 | 0 | 0 | 43 | 5287 | 731 | Individual |
| 3300060898_12 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738; HRBIN12; AC-51 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | HRBIN12 | AC-51 | 72.79 | 9.83 | 1809081 | 0 | 0 | 0 | 31 | 2143 | 336 | Individual |
| 3300060898_28 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Acidimicrobiia; Acidimicrobiales; JACDCH01; ZC4RG19 | Bacteria | Actinobacteriota | Acidimicrobiia | Acidimicrobiales | JACDCH01 | ZC4RG19 | 50.16 | 0.00 | 2356576 | 0 | 1 | 0 | 12 | 2576 | 488 | Individual |
| 3300060898_45 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Pseudomonadota; Betaproteobacteria | Bacteria; Proteobacteria; Gammaproteobacteria; Burkholderiales; SG8-39; SCGC-AG-212-J23 | Bacteria | Proteobacteria | Gammaproteobacteria | Burkholderiales | SG8-39 | SCGC-AG-212-J23 | 80.90 | 3.35 | 3608050 | 1 | 1 | 0 | 31 | 4193 | 373 | Individual |
| 3300060898_54 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738; HRBIN12; DSRY01 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | HRBIN12 | DSRY01 | 58.62 | 8.62 | 2193175 | 0 | 1 | 0 | 34 | 2537 | 386 | Individual |
| 3300060898_8 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738; HRBIN12; DSRY01 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | HRBIN12 | DSRY01 | 52.35 | 6.90 | 1241702 | 1 | 0 | 1 | 21 | 1441 | 212 | Individual |
| 3300060898_9 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_048-M-20210920 | GUAN | 048 | M | 20210920 | 3300060898 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Actinomycetia; Propionibacteriales; Nocardioidaceae | Bacteria | Actinobacteriota | Actinomycetia | Propionibacteriales | Nocardioidaceae | NA | 91.29 | 2.23 | 4726679 | 0 | 1 | 0 | 62 | 5124 | 430 | Individual |
| 3300060914_13 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; Thermoleophilia; Gaiellales; Gaiellaceae; JACDAN01 | Bacteria | Actinobacteriota | Thermoleophilia | Gaiellales | Gaiellaceae | JACDAN01 | 53.03 | 0.00 | 1546924 | 1 | 0 | 1 | 25 | 1811 | 206 | Individual |
| 3300060914_14 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae | NA | NA | NA | NA | NA | NA | NA | 51.63 | 0.00 | 906705 | 1 | 0 | 0 | 16 | 1054 | 153 | Individual |
| 3300060914_17 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | HQ | NA | Bacteria; Acidobacteriota; Blastocatellia; Pyrinomonadales; Pyrinomonadaceae; JACMLC01 | Bacteria | Acidobacteriota | Blastocatellia | Pyrinomonadales | Pyrinomonadaceae | JACMLC01 | 91.45 | 4.32 | 4759799 | 1 | 1 | 1 | 43 | 4080 | 47 | Individual |
| 3300060914_23 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Actinomycetia; Jiangellales; Jiangellaceae | Bacteria | Actinobacteriota | Actinomycetia | Jiangellales | Jiangellaceae | NA | 75.84 | 2.94 | 3336565 | 1 | 1 | 1 | 43 | 3769 | 468 | Individual |
| 3300060914_25 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Actinomycetes; Propionibacteriales | Bacteria; Actinobacteriota; Actinomycetia; Propionibacteriales; Propionibacteriaceae | Bacteria | Actinobacteriota | Actinomycetia | Propionibacteriales | Propionibacteriaceae | NA | 57.90 | 0.00 | 2403924 | 0 | 0 | 0 | 22 | 2610 | 296 | Individual |
| 3300060914_26 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Rubrobacteria; Rubrobacterales; Rubrobacteraceae; Rubrobacter | Bacteria; Actinobacteriota; Rubrobacteria; Rubrobacterales; Rubrobacteraceae; SCSIO-52909 | Bacteria | Actinobacteriota | Rubrobacteria | Rubrobacterales | Rubrobacteraceae | SCSIO-52909 | 83.19 | 1.32 | 2465732 | 0 | 0 | 0 | 30 | 2772 | 268 | Individual |
| 3300060914_28 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | NA | NA | 58.12 | 1.71 | 1438439 | 1 | 2 | 1 | 31 | 1654 | 149 | Individual |
| 3300060914_30 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Archaea | NA | NA | NA | NA | NA | NA | NA | 96.12 | 1.94 | 3938282 | 1 | 0 | 0 | 43 | 4828 | 375 | Individual |
| 3300060914_32 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; CADDZG01; WHSQ01; WHSV01 | Bacteria | Actinobacteriota | UBA4738 | CADDZG01 | WHSQ01 | WHSV01 | 73.70 | 2.56 | 2469498 | 0 | 0 | 0 | 30 | 2747 | 288 | Individual |
| 3300060914_35 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria | Bacteria; Acidobacteriota; Blastocatellia; RBC074; RBC074 | Bacteria | Acidobacteriota | Blastocatellia | RBC074 | RBC074 | NA | 63.74 | 5.90 | 5269156 | 0 | 0 | 0 | 24 | 4815 | 533 | Individual |
| 3300060914_39 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Actinomycetia | Bacteria | Actinobacteriota | Actinomycetia | NA | NA | NA | 94.79 | 2.99 | 5097796 | 0 | 0 | 0 | 102 | 5141 | 413 | Individual |
| 3300060914_41 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria | Bacteria; Desulfobacterota_B; Binatia; UBA9968; UBA9968; DP-1 | Bacteria | Desulfobacterota_B | Binatia | UBA9968 | UBA9968 | DP-1 | 50.54 | 2.52 | 2583074 | 0 | 0 | 0 | 13 | 2824 | 279 | Individual |
| 3300060914_44 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria | Bacteria; Chloroflexota; UBA6077; UBA6077; CF-72 | Bacteria | Chloroflexota | UBA6077 | UBA6077 | CF-72 | NA | 72.44 | 4.17 | 6734988 | 1 | 0 | 1 | 46 | 6737 | 745 | Individual |
| 3300060914_46 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Rubrobacteria; Rubrobacterales; Rubrobacteraceae; Rubrobacter | Bacteria; Actinobacteriota; Rubrobacteria; Rubrobacterales; Rubrobacteraceae; SCSIO-52909 | Bacteria | Actinobacteriota | Rubrobacteria | Rubrobacterales | Rubrobacteraceae | SCSIO-52909 | 58.04 | 0.00 | 2062795 | 1 | 0 | 0 | 23 | 2335 | 236 | Individual |
| 3300060914_49 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria | Bacteria; Acidobacteriota; Vicinamibacteria; Vicinamibacterales; UBA2999 | Bacteria | Acidobacteriota | Vicinamibacteria | Vicinamibacterales | UBA2999 | NA | 66.24 | 5.13 | 3149726 | 0 | 0 | 0 | 15 | 3046 | 511 | Individual |
| 3300060914_5 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria | Bacteria; Acidobacteriota; Vicinamibacteria; Vicinamibacterales; 2-12-FULL-66-21 | Bacteria | Acidobacteriota | Vicinamibacteria | Vicinamibacterales | 2-12-FULL-66-21 | NA | 51.90 | 0.00 | 1772468 | 0 | 0 | 0 | 8 | 1858 | 350 | Individual |
| 3300060914_50 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555 | Bacteria | Actinobacteriota | Acidimicrobiia | IMCC26256 | PALSA-555 | NA | 61.87 | 2.59 | 1607679 | 0 | 0 | 0 | 24 | 1916 | 257 | Individual |
| 3300060914_52 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae; Nitrososphaera; Candidatus Nitrososphaera evergladensis | NA | NA | NA | NA | NA | NA | NA | 86.89 | 2.43 | 1541281 | 1 | 1 | 1 | 34 | 2026 | 57 | Individual |
| 3300060914_53 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae; Nitrososphaera; Candidatus Nitrososphaera evergladensis | NA | NA | NA | NA | NA | NA | NA | 66.83 | 2.91 | 1374179 | 1 | 1 | 1 | 39 | 1607 | 94 | Individual |
| 3300060914_55 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | MQ | Bacteria; Actinomycetota; Actinomycetes; Propionibacteriales | Bacteria; Actinobacteriota; Actinomycetia; Propionibacteriales; Nocardioidaceae | Bacteria | Actinobacteriota | Actinomycetia | Propionibacteriales | Nocardioidaceae | NA | 89.84 | 2.68 | 3777568 | 1 | 2 | 1 | 25 | 4076 | 429 | Individual |
| 3300060914_9 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_043-M-20210921 | GUAN | 043 | M | 20210921 | 3300060914 | HQ | Bacteria; Actinomycetota; Thermoleophilia; Solirubrobacterales | Bacteria; Actinobacteriota; Thermoleophilia; Solirubrobacterales; 70-9; VAYN01 | Bacteria | Actinobacteriota | Thermoleophilia | Solirubrobacterales | 70-9 | VAYN01 | 98.28 | 0.86 | 2383318 | 1 | 1 | 1 | 52 | 2505 | 18 | Individual |
| 3300061642_12 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_006-M-20210922 | GUAN | 006 | M | 20210922 | 3300061642 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738; HRBIN12; DSRY01 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | HRBIN12 | DSRY01 | 67.22 | 2.14 | 1451150 | 0 | 0 | 0 | 20 | 1700 | 256 | Individual |
| 3300061642_24 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_006-M-20210922 | GUAN | 006 | M | 20210922 | 3300061642 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | NA | NA | 62.54 | 8.62 | 1931241 | 1 | 1 | 1 | 23 | 2194 | 272 | Individual |
| 3300061642_25 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_006-M-20210922 | GUAN | 006 | M | 20210922 | 3300061642 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Actinomycetia; Propionibacteriales; Nocardioidaceae | Bacteria | Actinobacteriota | Actinomycetia | Propionibacteriales | Nocardioidaceae | NA | 84.71 | 6.65 | 3972987 | 1 | 1 | 2 | 41 | 4345 | 516 | Individual |
| 3300061642_28 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_006-M-20210922 | GUAN | 006 | M | 20210922 | 3300061642 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae | NA | NA | NA | NA | NA | NA | NA | 77.02 | 4.37 | 1744480 | 1 | 0 | 0 | 28 | 1955 | 221 | Individual |
| 3300061643_10 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738; HRBIN12; AC-51 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | HRBIN12 | AC-51 | 76.07 | 2.14 | 2076534 | 0 | 0 | 0 | 36 | 2355 | 274 | Individual |
| 3300061643_15 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555 | Bacteria | Actinobacteriota | Acidimicrobiia | IMCC26256 | PALSA-555 | NA | 59.33 | 1.99 | 1787927 | 0 | 0 | 0 | 26 | 2107 | 288 | Individual |
| 3300061643_17 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Bacteria; Actinomycetota; Actinomycetes | Bacteria; Actinobacteriota; Actinomycetia; Propionibacteriales; Nocardioidaceae | Bacteria | Actinobacteriota | Actinomycetia | Propionibacteriales | Nocardioidaceae | NA | 58.44 | 6.65 | 2868668 | 1 | 0 | 1 | 24 | 3267 | 618 | Individual |
| 3300061643_26 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Bacteria; Actinomycetota | Bacteria; Actinobacteriota; UBA4738; UBA4738 | Bacteria | Actinobacteriota | UBA4738 | UBA4738 | NA | NA | 67.33 | 1.42 | 1697423 | 1 | 0 | 1 | 25 | 1948 | 248 | Individual |
| 3300061643_31 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Archaea; Nitrososphaerota; Nitrososphaeria; Nitrososphaerales; Nitrososphaeraceae; Nitrososphaera; Candidatus Nitrososphaera evergladensis | NA | NA | NA | NA | NA | NA | NA | 54.85 | 0.00 | 899286 | 1 | 1 | 1 | 19 | 1154 | 27 | Individual |
| 3300061643_33 | Guanica State Forest and Biosphere Reserve, Puerto Rico | GUAN_004-M-20210922 | GUAN | 004 | M | 20210922 | 3300061643 | MQ | Bacteria; Verrucomicrobiota | Bacteria; Verrucomicrobiota; Verrucomicrobiae; Chthoniobacterales; UBA10450; Udaeobacter | Bacteria | Verrucomicrobiota | Verrucomicrobiae | Chthoniobacterales | UBA10450 | Udaeobacter | 59.59 | 2.36 | 1609121 | 0 | 0 | 0 | 21 | 1734 | 146 | Individual |
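
In the table above, the `GTDB-Tk Taxonomy Lineage` column packs all ranks into one semicolon-separated string, and the `Domain` through `Genus` columns repeat the same information rank by rank. How those columns were produced is not shown here, so the following is only a minimal sketch, assuming the `tidyr` package, of one way such a split could be done. The three lineage values are copied from the table; the object names `lineages`, `ranks`, and `split_lineages` are made up for illustration.

```r
library(tibble)
library(tidyr)

# Three bins copied from the table above; the archaeal bin has no GTDB-Tk lineage.
lineages <- tibble(
  `Bin ID` = c("3300060854_44", "3300060854_5", "3300060854_24"),
  `GTDB-Tk Taxonomy Lineage` = c(
    "Bacteria; Actinobacteriota; Thermoleophilia; Gaiellales; Gaiellaceae; JACDAN01",
    "Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555",
    NA
  )
)

ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus")

# Split on "; " into one column per rank. fill = "right" pads lineages that
# stop before genus with NA; remove = FALSE keeps the original string column.
split_lineages <- separate(
  lineages,
  `GTDB-Tk Taxonomy Lineage`,
  into = ranks,
  sep = "; ",
  remove = FALSE,
  fill = "right"
)

print(split_lineages)
```

Bins whose lineage stops above genus level, such as the PALSA-555 bin, end up with `NA` in the remaining rank columns, which matches the `NA` entries in the table.
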

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Class`)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Class Counts at Guanica State Forest")

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Order`)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Order Counts at Guanica State Forest")

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Family`)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Family Counts at Guanica State Forest")

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Genus`)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Genus Counts at Guanica State Forest")

NEON_MAGs_GSF %>%
  ggplot(aes(x = `Bin Completeness`, y = `Bin Contamination`)) +
  geom_point() +
  labs(title = "Bin Completeness Values vs Bin Contamination Values at Guanica State Forest")

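
The bar charts above show which taxa are most common, and the same ranking can be read off as numbers. The following is a minimal sketch, assuming the `dplyr` package, that tallies bins per order and per genus. To stay self-contained it uses only the six bins from sample GUAN_048 copied from the table above, stored in a made-up object `guan_048`; in the actual analysis the same calls would presumably run on `NEON_MAGs_GSF`.

```r
library(dplyr)
library(tibble)

# Order and genus assignments for the six GUAN_048 bins in the table above.
guan_048 <- tibble(
  Order = c("UBA4738", "Acidimicrobiales", "Burkholderiales",
            "UBA4738", "UBA4738", "Propionibacteriales"),
  Genus = c("AC-51", "ZC4RG19", "SCGC-AG-212-J23",
            "DSRY01", "DSRY01", NA)
)

# Number of bins per order, most frequent first.
order_counts <- guan_048 %>%
  count(Order, sort = TRUE)

# Number of bins per genus, leaving out bins with no genus assignment.
genus_counts <- guan_048 %>%
  filter(!is.na(Genus)) %>%
  count(Genus, sort = TRUE)

print(order_counts)
print(genus_counts)
```

Dropping the `NA` rows before counting keeps unassigned bins from appearing as if they were a taxon of their own, which is worth keeping in mind when reading the bar charts as well.
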
As mentioned above, the focal site was Guanica State Forest and Biosphere Reserve and the focal phylum was Dormibacterota. At Guanica State Forest, the most abundant domain was Bacteria, the most abundant order was UBA4738, the most abundant family was HRBIN12, and the most abundant genus was DSRY01. The top bin lineage was Bacteria; Actinomycetota, and the top GTDB-Tk taxonomy lineage was Bacteria; Actinobacteriota; Acidimicrobiia; IMCC26256; PALSA-555. The most abundant bacterial phylum at this site was Actinobacteriota. Guanica State Forest was classified under the tropical forest ecosystem subtype; the other subtypes in the data were wetlands, tundra, temperate forest, shrubland, grasslands, desert, and boreal forest.

The ecosystems where Dormibacterota were found were shrublands, grasslands, temperate forest, and boreal forest (in order from highest to lowest count). As can be seen in the circular phylogenetic tree, Dormibacterota is a very small phylum in this data set, which is further supported by the small number of bases in the data. Based on the soil temperature vs ecosystem subtype graph, it seems to prefer warmer temperatures. Dormibacterota were most abundant at Niwot Ridge, Colorado, with counts at Yellowstone and Denali National Park following closely behind.

These results are significant because they provide insight into where Dormibacterota prefer to live and which organisms are most abundant in Guanica State Forest, Puerto Rico. The research on Dormibacterota is very limited, from what I can find, so these results are a good start for this evolving area of research. This data analysis had some limitations because R was difficult to use at times, and the learning process may have affected some of the results, since the data may not be represented in the best way possible.
This research project looked at the taxonomic groups at Guanica State Forest and at Dormibacterota, in particular, across multiple NEON sites. The data were analyzed and presented, but further research is needed to understand the diversity of Guanica State Forest and the characteristics of Dormibacterota. I think that a deeper dive into Guanica State Forest would benefit the field of genomics because it would allow researchers to better categorize the species that are there and add detail to the broader-scope research in this project. Visiting the sites where Dormibacterota were found would allow researchers to understand why this phylum prefers those locations and what it may be contributing to the environment there. Overall, this project provides a good summary of Dormibacterota across sites and of the taxonomic breakdown at Guanica State Forest, but deeper analysis would provide more insight into all phyla and NEON sites, leading to a more comprehensive overview.